Clause Boundary Detection in Transcribed Spoken Language

Author

  • Fredrik Jørgensen
Abstract

We argue that finite clauses should be regarded as the basic unit in the syntactic analysis of spoken language, and describe a method that automatically detects clause boundaries by classifying each coordinating conjunction in spoken language discourse as belonging to either the syntactic level or the discourse level of analysis. The method exploits two observations: coordinating conjunctions play a special role in organizing spoken language discourse, and coordinating conjunctions at the discourse level mark clause boundaries.
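
To make the idea concrete, the following is a minimal toy sketch, not the classifier described in the paper. It assumes the input is already tokenized and tagged with a hypothetical coarse tag set in which `CC` marks a coordinating conjunction and `VFIN` a finite verb. The rule used here (a conjunction is treated as discourse-level when there is a finite verb on both sides of it) is an illustrative simplification, and the function names are invented for this example.

```python
from typing import Dict, List, Tuple

# (word, tag) pairs; the tag set ("CC", "VFIN", ...) is a hypothetical one
# chosen for this illustration, not the tag set used in the paper.
Token = Tuple[str, str]


def has_finite_verb(tokens: List[Token]) -> bool:
    """True if any token in the span is tagged as a finite verb."""
    return any(tag == "VFIN" for _, tag in tokens)


def classify_conjunctions(tokens: List[Token]) -> Dict[int, str]:
    """Label each coordinating conjunction as 'discourse' or 'syntactic'.

    Toy rule (an assumption): a conjunction is discourse-level when the span
    since the last detected boundary and the span up to the next conjunction
    both contain a finite verb; otherwise it is syntactic-level.
    """
    labels: Dict[int, str] = {}
    cc_positions = [i for i, (_, tag) in enumerate(tokens) if tag == "CC"]
    last_boundary = 0
    for n, i in enumerate(cc_positions):
        next_cc = cc_positions[n + 1] if n + 1 < len(cc_positions) else len(tokens)
        left = tokens[last_boundary:i]
        right = tokens[i + 1:next_cc]
        if has_finite_verb(left) and has_finite_verb(right):
            labels[i] = "discourse"
            last_boundary = i
        else:
            labels[i] = "syntactic"
    return labels


def split_clauses(tokens: List[Token]) -> List[List[Token]]:
    """Start a new clause at every discourse-level conjunction."""
    labels = classify_conjunctions(tokens)
    clauses: List[List[Token]] = []
    current: List[Token] = []
    for i, tok in enumerate(tokens):
        if labels.get(i) == "discourse" and current:
            clauses.append(current)
            current = []
        current.append(tok)
    if current:
        clauses.append(current)
    return clauses


# Invented example utterance: the first "and" joins two nouns, the second
# introduces a new finite clause.
utterance: List[Token] = [
    ("I", "PRON"), ("bought", "VFIN"), ("apples", "NOUN"),
    ("and", "CC"), ("pears", "NOUN"),
    ("and", "CC"), ("then", "ADV"), ("I", "PRON"), ("went", "VFIN"), ("home", "ADV"),
]

for clause in split_clauses(utterance):
    print(" ".join(word for word, _ in clause))
```

In this sketch the first "and" is labeled syntactic-level and stays inside its clause, while the second is labeled discourse-level and opens a new clause. A realistic system would need a richer decision procedure, for example to handle coordinated verb phrases that share a subject.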

Similar resources

Shallow Parsing of Spoken Estonian Using Constraint Grammar

In this paper we describe how we have adapted the syntactic analyzer of written Estonian to the spoken language. The Constraint Grammar shallow syntactic parser (Müürisep et al. 2003) was used for the automatic syntactic analysis of the corpus of Estonian spoken language (Hennoste et al. 2000). To adapt the parser, the clause boundary detection rules as well as some syntactic constraints had to...

Dependency parsing of Japanese spoken monologue based on clause-starts detection

A dependency parsing method based on sentence segmentation into clauses has been proposed and confirmed to be effective. In this method, dependency parsing is executed in two stages: at the clause level and the sentence level. However, since a sentence cannot be segmented into complete clauses, in past research, a unit sandwiched between two clause-end boundaries (clause boundary unit) was...

Relative Importance in English and Persian: Thematization or Tonic Prominence?

There are two common ways to assign relative importance in spoken language: tonic prominence and thematization. The former is expressing the main points of information units in speech (Halliday, 1994), and the latter is putting an element at the beginning of a clause. This study explores how relative importance is realized in English and Persian. It also investigates how advanced Persian learne...

Robust clause boundary identification for corpus annotation

The paper describes a rule-based system for tagging clause boundaries, implemented for annotating the Estonian Reference Corpus of the University of Tartu, a collection of written texts containing ca 245 million running words and available for querying via Keeleveeb language portal. The system needs information about parts of speech and grammatical categories coded in the word-forms, i.e. it ta...

Incremental dependency parsing of Japanese spoken monologue based on clause boundaries

In applications of spoken monologue processing such as simultaneous machine interpretation and real-time caption generation, incremental language parsing is strongly required. This paper proposes a technique for incremental dependency parsing of Japanese spoken monologue on a clause-by-clause basis. The technique identifies the clauses based on clause boundary analysis, analyzes the dependen...

Publication date: 2007